Regularized Policy Iteration with Nonparametric Function Spaces
نویسندگان
چکیده
We study two regularization-based approximate policy iteration algorithms, namely REGLSPI and REG-BRM, to solve reinforcement learning and planning problems in discounted Markov Decision Processes with large state and finite action spaces. The core of these algorithms are the regularized extensions of the Least-Squares Temporal Difference (LSTD) learning and Bellman Residual Minimization (BRM), which are used in the algorithms’ policy evaluation steps. Regularization provides a convenient way to control the complexity of the function space to which the estimated value function belongs and as a result enables us to work with rich nonparametric function spaces. We derive efficient implementations of our methods when the function space is a reproducing kernel Hilbert space. We analyze the statistical properties of REG-LSPI and provide an upper bound on the policy evaluation error and the performance loss of the policy returned by this method. Our bound shows the dependence of the loss on the number of samples, the capacity of the function space, and some intrinsic properties of the underlying Markov Decision Process. The dependence of the policy evaluation bound on the number of samples is minimax optimal. This is the first work that provides such a strong guarantee for a nonparametric approximate policy iteration algorithm.
منابع مشابه
Iterative Regularization for Learning with Convex Loss Functions
We consider the problem of supervised learning with convex loss functions and propose a new form of iterative regularization based on the subgradient method. Unlike other regularization approaches, in iterative regularization no constraint or penalization is considered, and generalization is achieved by (early) stopping an empirical iteration. We consider a nonparametric setting, in the framewo...
متن کاملHilbert Space Embeddings of POMDPs
A nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes’ rule to these distribution embeddings. Policies and value functions are defin...
متن کاملتحلیل ممیز غیرپارامتریک بهبودیافته برای دستهبندی تصاویر ابرطیفی با نمونه آموزشی محدود
Feature extraction performs an important role in improving hyperspectral image classification. Compared with parametric methods, nonparametric feature extraction methods have better performance when classes have no normal distribution. Besides, these methods can extract more features than what parametric feature extraction methods do. Nonparametric feature extraction methods use nonparametric s...
متن کاملOptimization of Solution Regularized Long-wave Equation by Using Modified Variational Iteration Method
In this paper, a regularized long-wave equation (RLWE) is solved by using the Adomian's decomposition method (ADM) , modified Adomian's decomposition method (MADM), variational iteration method (VIM), modified variational iteration method (MVIM) and homotopy analysis method (HAM). The approximate solution of this equation is calculated in the form of series which its components are computed by ...
متن کاملRegularized Policy Iteration
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by addingL-regularization to...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 17 شماره
صفحات -
تاریخ انتشار 2016